-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NANOVDB_USE_SYNC_CUDA_MALLOC define to force sync CUDA malloc #1799
base: master
Are you sure you want to change the base?
Add NANOVDB_USE_SYNC_CUDA_MALLOC define to force sync CUDA malloc #1799
Conversation
In virtualized environments that slice up the GPU and share it between instances as vGPU's, GPU unified memory is usually disabled out of security considerations. Asynchronous CUDA malloc/free depends on GPU unified memory, so before, it was not possible to deploy and run NanoVDB code in such environments. This commit adds macros CUDA_MALLOC and CUDA_FREE and replaces all CUDA alloc/free calls with these macros. CUDA_MALLOC and CUDA_FREE expand to asynchronous CUDA malloc & free if the following two conditions are met: - CUDA version needs to be >= 11.2 as this is the first version that supports cudaMallocAsync/cudaMallocFree - NANOVDB_USE_SYNC_CUDA_MALLOC needs to undefined In all other cases, CUDA_MALLOC and CUDA_FREE expand to synchronous cudaMalloc/cudaFree. Since NanoVDB is distributed as header-only, setting the NANOVDB_USE_SYNC_CUDA_MALLOC flag should be handled by the project's build system itself. Signed-off-by: Wouter Bijlsma <[email protected]>
Signed-off-by: Wouter Bijlsma <[email protected]>
2e63b1e
to
4c01b38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great contribution ! However, before I approve it let me try out your fix in the private development branch of NanoVDB. I will sync that repo up with the this (public) repo in the coming week - it includes several changes and improvements :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at closer inspection, why not simply replace this existing line:
#if CUDART_VERSION < 11020
with
#if (CUDART_VERSION < 11020) || defined(NANOVDB_USE_SYNC_CUDA_MALLOC)
This avoids the need to introduce the new macro and also works with existing client code of NanoVDB that may already be using cudaMallocAsync and cudaFreeAsync. I tried it on Linux but not yet on Windows :)
@kmuseth that was the first solution I tried, but it didn’t work, because cudaMalloc and cudaFree calls would still be resolved to the CUDA ones and not the redefined ones from the NanoVDB header. Without any modification to our build I still got CUDA ‘not supported’ errors on allocations. Maybe this can be worked around with some linker directives but that seems brittle and could require possible annoying changes to the build system of projects that include the NanoVDB headers. |
This was on Linux by the way, so it’s interesting it did work on your side, I assume there could be some link differences between our builds. I could double check next week to verify again to be sure and to find out why. |
@w0utert I think you're right so I changed my implementation by simply placing the definitions of the functions in a namespace (nanovdb). So in the end my solution looks very much like yours, except I define functions vs macros. Let me create a PR that includes this fix plus a ton of other (long overdue) improvements to NanoVDB. I'll point you to the relevant changes so you can validate that it does indeed work for you. A warning, this new PR introduces new namespaces in NanoVDB so your client code might need to be tweaked. I can of course help. |
@kmuseth sounds good, that’s actually nicer than using macro’s! I will try your PR early next week and report back, but I’m pretty sure it will work. |
@kmuseth |
@w0utert excellent - so are you okay if we close this PR? |
@kmuseth yes, you can close this PR! |
In virtualized environments that slice up the GPU and share it between instances as vGPU's, GPU unified memory is usually disabled out of security considerations. Asynchronous CUDA malloc/free depends on GPU unified memory, so before, it was not possible to deploy and run NanoVDB code in such environments.
This commit adds macros CUDA_MALLOC and CUDA_FREE and replaces all CUDA alloc/free calls with these macros. CUDA_MALLOC and CUDA_FREE expand to asynchronous CUDA malloc & free if the following two conditions are met:
In all other cases, CUDA_MALLOC and CUDA_FREE expand to synchronous cudaMalloc/cudaFree.
Since NanoVDB is distributed as header-only, setting the NANOVDB_USE_SYNC_CUDA_MALLOC flag should be handled by the project's build system itself.